76 research outputs found

    Budgeted Reinforcement Learning in Continuous State Space

    A Budgeted Markov Decision Process (BMDP) is an extension of a Markov Decision Process to critical applications requiring safety constraints. It relies on a notion of risk implemented in the shape of a cost signal constrained to lie below an adjustable threshold. So far, BMDPs could only be solved in the case of finite state spaces with known dynamics. This work extends the state of the art to environments with continuous state spaces and unknown dynamics. We show that the solution to a BMDP is a fixed point of a novel Budgeted Bellman Optimality operator. This observation allows us to introduce natural extensions of Deep Reinforcement Learning algorithms to address large-scale BMDPs. We validate our approach on two simulated applications: spoken dialogue and autonomous driving. Comment: N. Carrara and E. Leurent contributed equally.
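    The fixed-point view above can be illustrated with a tabular analogue. The sketch below is a hypothetical toy instance, not the paper's continuous-space operator: all quantities (P, R, C, gamma, beta) are made up, and the backup simply maximises return over actions whose expected cumulative cost stays below the adjustable threshold.

```python
import numpy as np

# Illustrative sketch only: a tabular analogue of a budgeted Bellman backup.
# The paper's operator acts on continuous state spaces; every quantity here
# (P, R, C, gamma, beta) is a made-up toy instance.

n_states, n_actions = 3, 2
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # transition kernel
R = rng.uniform(0, 1, size=(n_states, n_actions))                 # reward signal
C = rng.uniform(0, 1, size=(n_states, n_actions))                 # cost signal
gamma, beta = 0.9, 0.3                                            # discount, cost budget

Qr = np.zeros((n_states, n_actions))  # expected return
Qc = np.zeros((n_states, n_actions))  # expected cumulative cost

for _ in range(200):
    Vr = np.zeros(n_states)
    Vc = np.zeros(n_states)
    for s in range(n_states):
        # Greedy step: maximise return among actions whose expected cost
        # stays below the adjustable threshold beta; fall back to the
        # least-cost action when no action is feasible.
        feasible = np.flatnonzero(Qc[s] <= beta)
        a = feasible[np.argmax(Qr[s, feasible])] if feasible.size else np.argmin(Qc[s])
        Vr[s], Vc[s] = Qr[s, a], Qc[s, a]
    Qr = R + gamma * P @ Vr  # Bellman backup on the return
    Qc = C + gamma * P @ Vc  # parallel backup on the cost
```

    Note how the cost estimate Qc is carried alongside the return Qr, so the threshold can be changed at evaluation time without recomputing the model.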

    Online learning and transfer for user adaptation in dialogue systems

    We address the problem of user adaptation in Spoken Dialogue Systems. The goal is to adapt quickly online to a new user, given a large amount of dialogues collected with other users. Previous works using Transfer for Reinforcement Learning tackled this problem when the number of source users remains limited. In this paper, we overcome this constraint by clustering the source users: each user cluster, represented by its centroid, is used as a potential source in the state-of-the-art Transfer Reinforcement Learning algorithm. Our benchmark compares several clustering approaches, including one based on a novel metric. All experiments are conducted on a negotiation dialogue task, and their results show significant improvements over baselines.
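    The clustering step described above can be sketched as follows. The user feature vectors, the number of clusters, and plain Euclidean k-means are illustrative assumptions, not the paper's novel metric or exact setup.

```python
import numpy as np

# Hypothetical sketch of the clustering step: each source user is
# summarised by a feature vector, k-means groups the users, and each
# cluster centroid stands in for its users as one transfer source.
# Feature construction and k are assumptions, not the paper's setup.

rng = np.random.default_rng(1)
users = rng.normal(size=(40, 5))  # 40 source users, 5 features each
k = 3
centroids = users[rng.choice(len(users), k, replace=False)].copy()

for _ in range(50):
    # Assign each user to its nearest centroid (Euclidean distance),
    # then recompute each centroid as the mean of its assigned users.
    d = np.linalg.norm(users[:, None, :] - centroids[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    for j in range(k):
        if np.any(labels == j):
            centroids[j] = users[labels == j].mean(axis=0)

# Each centroid would then be plugged into the transfer RL algorithm
# as one candidate source, instead of every individual user.
```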

    SELECTIVE HYDROGENOLYSIS OF GLYCEROL TO PROPYLENE GLYCOL IN A CONTINUOUS FLOW TRICKLE BED REACTOR USING COPPER CHROMITE AND Cu/Al2O3 CATALYSTS

    The glycerol hydrogenolysis reaction was performed in a continuous-flow trickle bed reactor using a water-glycerol feed and both copper chromite and Cu/Al2O3 catalysts. The commercial copper chromite had a higher activity than the laboratory-prepared Cu/Al2O3 and was used for most of the tests. Propylene glycol was the main product with both catalysts, acetol being the main by-product. Temperature was found to be the main variable influencing the conversion of glycerol. When the glycerol-water reactant mixture was completely liquid, at temperatures lower than 190 degrees C, conversion was low and deactivation was observed. At reaction temperatures of 210-230 degrees C the conversion of glycerol was complete and the selectivity to propylene glycol was stable at about 60-80% throughout the 10 h reaction time span, regardless of the hydrogen pressure level (1 to 20 atm). These values could not be improved significantly by using different reaction conditions or by increasing the catalyst acidity. At higher temperatures (245-250 degrees C) the conversion was also 100%. Under reaction conditions at which copper chromite suffered deactivation, light by-products and surface deposits were formed. The deposits could be completely burned off at 250 degrees C and the catalyst activity fully recovered.

    Metal and precursor effect during 1-heptyne selective hydrogenation using an activated carbon as support

    Palladium, platinum, and ruthenium supported on activated carbon were used as catalysts for the selective hydrogenation of 1-heptyne, a terminal alkyne. All catalysts were characterized by temperature-programmed reduction (TPR), X-ray diffraction, transmission electron microscopy, and X-ray photoelectron spectroscopy (XPS). TPR and XPS suggest that the metal in all catalysts is reduced after the pretreatment with H2 at 673 K. The TPR trace of the PdNRX catalyst shows that the support surface groups are greatly modified as a consequence of the use of HNO3 during catalyst preparation. During the hydrogenation of 1-heptyne, both palladium catalysts were more active and selective than the platinum and ruthenium catalysts. The activity order of the catalysts is as follows: PdClRX > PdNRX > PtClRX ≫ RuClRX. The superior performance of PdClRX was attributed in part to the total occupancy of the d electronic levels of the Pd metal, which is supposed to promote the rupture of the H2 bond during the hydrogenation reaction. The activity differences between the PdClRX and PdNRX catalysts could be attributed to better accessibility of the substrate to the active sites, as a consequence of steric and electronic effects of the superficial support groups. The order for the selectivity to 1-heptene is as follows: PdClRX = PdNRX > RuClRX > PtClRX, and it can be mainly attributed to thermodynamic effects.

    A Fitted-Q Algorithm for Budgeted MDPs

    Workshop on Safety, Risk and Uncertainty in Reinforcement Learning. https://sites.google.com/view/rl-uai2018/ We address the problem of budgeted/constrained reinforcement learning in continuous state space using a batch of transitions. For this purpose, we introduce a novel algorithm called Budgeted Fitted-Q (BFTQ). We carry out some preliminary benchmarks on a continuous 2-D world. They show that BFTQ performs as well as a penalized Fitted-Q algorithm while also allowing one to adapt the trained policy on the fly for a given amount of budget, without the need to engineer reward penalties. We believe that the general principles used to design BFTQ could be used to extend other classical reinforcement learning algorithms to budget-oriented applications.
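    As a rough sketch of the fitted-Q family that BFTQ belongs to, the following runs a plain (unbudgeted) fitted-Q iteration on a toy batch of transitions with a linear regressor; BFTQ itself additionally learns a cost value function and conditions the policy on the remaining budget. All names, sizes, and features here are illustrative assumptions.

```python
import numpy as np

# Minimal fitted-Q sketch on a batch of transitions with a linear
# function approximator. BFTQ extends this pattern with a second,
# cost-value regression and a budget-conditioned policy; everything
# here (features, batch, discount) is an illustrative assumption.

rng = np.random.default_rng(2)
n, d, n_actions, gamma = 200, 4, 2, 0.95

S = rng.normal(size=(n, d))            # states
A = rng.integers(n_actions, size=n)    # actions taken
R = rng.uniform(size=n)                # observed rewards
S2 = rng.normal(size=(n, d))           # next states

def features(s, a):
    # One weight block per action via block one-hot features.
    phi = np.zeros((len(s), d * n_actions))
    for i, ai in enumerate(a):
        phi[i, ai * d:(ai + 1) * d] = s[i]
    return phi

w = np.zeros(d * n_actions)
for _ in range(20):
    # Regression target: r + gamma * max_a' Q(s', a')
    q_next = np.stack([features(S2, np.full(n, a)) @ w for a in range(n_actions)])
    y = R + gamma * q_next.max(axis=0)
    # Refit the approximator to the bootstrapped targets (least squares).
    X = features(S, A)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
```

    The batch of transitions is reused at every iteration, which is what makes the fitted-Q family suited to the offline setting the abstract describes.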

    Safe transfer learning for dialogue applications

    In this paper, we formulate the hypothesis that the first dialogues with a new user should be handled in a very conservative way, for two reasons: to avoid user dropout, and to gather more successful dialogues to speed up the learning of the asymptotic strategy. To this end, we propose to transfer a safe strategy to initiate the first dialogues.

    Towards long-term standardised carbon and greenhouse gas observations for monitoring Europe's terrestrial ecosystems : a review

    Research infrastructures play a key role in launching a new generation of integrated, long-term, geographically distributed observation programmes designed to monitor climate change, better understand its impacts on global ecosystems, and evaluate possible mitigation and adaptation strategies. The pan-European Integrated Carbon Observation System combines carbon and greenhouse gas (GHG; CO2, CH4, N2O, H2O) observations within the atmosphere, terrestrial ecosystems and oceans. High-precision measurements are obtained using standardised methodologies, are centrally processed, and are openly available in a traceable and verifiable fashion in combination with detailed metadata. The Integrated Carbon Observation System ecosystem station network aims to sample climate and land-cover variability across Europe. In addition to GHG flux measurements, a large set of complementary data (including management practices, vegetation and soil characteristics) is collected to support the interpretation, spatial upscaling and modelling of observed ecosystem carbon and GHG dynamics. The applied sampling design was developed and formulated in protocols by the scientific community, representing a trade-off between an ideal dataset and practical feasibility. The use of open-access, high-quality and multi-level data products by different user communities is crucial for the Integrated Carbon Observation System in order to achieve its scientific potential and societal value.